feat(python): expose register_table_function for Paimon UDTFs #324
Open
shyjsarah wants to merge 2 commits into
Add `SQLContext.register_table_function(name, default_database=None)` to the Python binding so that Paimon table-valued functions can be registered from Python. Previously the binding had no way to reach `register_udtf`. A single dispatch method keeps the API surface stable: it currently supports `vector_search` and `full_text_search`, and the same `match` will pick up `referenced_files_size` / `physical_files_size` once those land, without changing the Python signature.

The function binds to the current catalog. To let the binding obtain that catalog without keeping a duplicate handle of its own, `SQLContext::current_catalog` is made public. The binding also enables the `fulltext` feature so that `register_full_text_search` is available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
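The dispatch described in this commit can be sketched as a small pure-Python model. It is not the binding's code, which is Rust. `SQLContextModel`, `Catalog`, `_register`, and the `udtfs` dict are hypothetical stand-ins. The `RuntimeError` for the no-catalog case is also an assumption, since the PR only says that case raises.

```python
from typing import Dict, Optional, Tuple


class Catalog:
    """Hypothetical stand-in for the Rust-side `Arc<dyn Catalog>`."""

    def __init__(self, name: str) -> None:
        self.name = name


class SQLContextModel:
    """Hypothetical pure-Python model of the dispatch; not the pypaimon class."""

    def __init__(self) -> None:
        self._current: Optional[Catalog] = None
        # function name -> (catalog name, default database) for each registered UDTF
        self.udtfs: Dict[str, Tuple[str, Optional[str]]] = {}

    def register_catalog(self, catalog: Catalog) -> None:
        # Simplification (assumption): the registered catalog becomes current.
        self._current = catalog

    def current_catalog(self) -> Optional[Catalog]:
        # Plays the role of the now-public SQLContext::current_catalog: the
        # binding reads the catalog here instead of holding a duplicate handle.
        return self._current

    def register_table_function(
        self, name: str, default_database: Optional[str] = None
    ) -> None:
        catalog = self.current_catalog()
        if catalog is None:
            # Exception type is an assumption; the PR only says this case raises.
            raise RuntimeError("no current catalog; register a catalog first")
        # Single dispatch point, analogous to the Rust `match` on the name.
        # Supporting referenced_files_size / physical_files_size later would
        # add a branch here without changing this method's signature.
        if name == "vector_search":
            self._register(name, catalog, default_database)  # real code: register_vector_search
        elif name == "full_text_search":
            self._register(name, catalog, default_database)  # real code: register_full_text_search
        else:
            raise ValueError(
                f"unknown table function {name!r}; "
                "supported: 'vector_search', 'full_text_search'"
            )

    def _register(
        self, name: str, catalog: Catalog, default_database: Optional[str]
    ) -> None:
        # Hypothetical placeholder for the Rust register_* call on the session.
        self.udtfs[name] = (catalog.name, default_database)
```

With one method and a name-based branch, callers always write `ctx.register_table_function("...")`. A new UDTF only widens the set of accepted names and leaves the Python signature unchanged.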
Add tests for `SQLContext.register_table_function`:

- `vector_search` / `full_text_search` register without error
- the optional `default_database` keyword is accepted
- an unknown function name raises a clear error
- calling it before any catalog is registered raises an error

Registration alone touches neither the Lumina nor the Tantivy runtime, so these tests are deterministic and need no index fixtures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
JingsongLi requested changes on May 18, 2026
We should register it in Catalog by default in Rust. This is legacy work from before. Can you modify it?
Purpose
Linked issue: close #xxx
`paimon-datafusion` provides `register_vector_search` / `register_full_text_search` to register table-valued functions (UDTFs) on a session. But the Python binding `PySQLContext` only exposed `register_catalog` / `set_current_*` / `register_batch` / `sql`; there was no way to reach `register_udtf` from Python, so these UDTFs were entirely unusable from `pypaimon`.

This PR exposes a single registration entry point to the Python binding.
Brief change log
- `bindings/python/src/context.rs`: add `SQLContext.register_table_function(name, default_database=None)`. A single dispatch method (rather than one method per function) keeps the Python API surface stable: it `match`es on the function name, currently handling `vector_search` and `full_text_search`, and raises a clear `ValueError` for an unknown name. The function is bound to the current catalog.
- `crates/integrations/datafusion/src/sql_context.rs`: change `SQLContext::current_catalog` from private to `pub`. The binding needs the registered `Arc<dyn Catalog>` to pass to `register_*`; exposing the accessor lets it read the catalog from `SQLContext` instead of keeping a duplicate catalog handle.
- `bindings/python/Cargo.toml`: enable the `fulltext` feature on `paimon-datafusion` (pulls in `tantivy` + `tempfile`, both pure Rust) so that `register_full_text_search` is compiled into the binding.

Once `register_referenced_files_size` / `register_physical_files_size` land on `main`, wiring them is a two-line addition to the `match`; the Python signature does not change.

Tests
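`bindings/python/tests/test_datafusion.py`: 5 new tests:

- `vector_search` / `full_text_search` register without error
- the `default_database` keyword is accepted
- an unknown function name raises a clear error
- calling it before any catalog is registered raises an error

Registration touches neither the Lumina nor the Tantivy runtime, so the tests are deterministic and need no index fixtures.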
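The five registration-only tests can be sketched as plain functions that need no index fixtures. `FakeSQLContext`, `ctx_with_catalog`, and `expect_raises` (a small substitute for `pytest.raises`) are hypothetical stand-ins so that the sketch runs on its own. The real tests exercise the actual binding, and the no-catalog exception type (`RuntimeError` here) is an assumption.

```python
from typing import Callable, Dict, Optional, Type


class FakeSQLContext:
    """Hypothetical stand-in with only what the tests touch; not the pypaimon class."""

    SUPPORTED = ("vector_search", "full_text_search")

    def __init__(self) -> None:
        self.catalog: Optional[str] = None
        # function name -> default_database passed at registration
        self.registered: Dict[str, Optional[str]] = {}

    def register_catalog(self, name: str) -> None:
        self.catalog = name

    def register_table_function(
        self, name: str, default_database: Optional[str] = None
    ) -> None:
        if self.catalog is None:
            raise RuntimeError("no current catalog")  # exception type assumed
        if name not in self.SUPPORTED:
            raise ValueError(f"unknown table function: {name!r}")
        # Registration only records the binding; no index runtime is started.
        self.registered[name] = default_database


def ctx_with_catalog() -> FakeSQLContext:
    ctx = FakeSQLContext()
    ctx.register_catalog("paimon")
    return ctx


def expect_raises(exc_type: Type[BaseException], fn: Callable[[], object]) -> str:
    """Minimal substitute for pytest.raises: return the expected exception's message."""
    try:
        fn()
    except exc_type as exc:
        return str(exc)
    raise AssertionError(f"{exc_type.__name__} was not raised")


def test_vector_search_registers() -> None:
    ctx = ctx_with_catalog()
    ctx.register_table_function("vector_search")
    assert "vector_search" in ctx.registered


def test_full_text_search_registers() -> None:
    ctx = ctx_with_catalog()
    ctx.register_table_function("full_text_search")
    assert "full_text_search" in ctx.registered


def test_default_database_keyword() -> None:
    ctx = ctx_with_catalog()
    ctx.register_table_function("vector_search", default_database="default")
    assert ctx.registered["vector_search"] == "default"


def test_unknown_function_raises() -> None:
    ctx = ctx_with_catalog()
    message = expect_raises(ValueError, lambda: ctx.register_table_function("nope"))
    assert "nope" in message


def test_requires_catalog() -> None:
    ctx = FakeSQLContext()
    expect_raises(RuntimeError, lambda: ctx.register_table_function("vector_search"))
```

Under pytest, the `test_*` functions are collected automatically when the file name follows pytest's default `test_*.py` pattern.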
API and Format
- `SQLContext.register_table_function`.
- `SQLContext::current_catalog` (previously private).
- `paimon-datafusion/fulltext` (adds `tantivy`).

Documentation
New Python-facing API. The Rust-side `docs/src/sql.md` already documents the underlying `register_*` functions; the pypaimon-facing docs live in the `apache/paimon` repo and can be updated as a follow-up.